Development of a Cardiovascular Disease Risk Prediction Model: A Preliminary Retrospective Cohort Study of a Patient Sample in Saudi Arabia

Saudi Arabia has an alarmingly high incidence of cardiovascular disease (CVD) and its associated risk factors. To effectively assess CVD risk, it is essential to develop tailored models for diverse regions and ethnicities using local population variables. No CVD risk prediction model has been locally developed. This study aims to develop the first 10-year CVD risk prediction model for Saudi adults aged 18 to 75 years. The electronic health records of Saudi male and female patients aged 18 to 75 years who were seen in primary care settings between January 2002 and February 2019 were reviewed retrospectively via the Integrated Clinical Information System (ICIS) database. The Cox regression model was used to identify the risk factors and develop the CVD risk prediction model. Overall, 451 patients were included in this study, with a mean follow-up of 12.05 years. Thirty-five (7.7%) patients developed a CVD event. The following risk factors were included in the model: fasting blood sugar (FBS), high-density lipoprotein cholesterol (HDL-c), heart failure, antihyperlipidemic therapy, antithrombotic therapy, and antihypertensive therapy. The Bayesian information criterion (BIC) score was 314.4. This is the first prediction model developed in Saudi Arabia and the second in any Arab country after the Omani study. We anticipate that our CVD prediction model will have the potential to be used widely after a validation study.


Introduction
Cardiovascular disease (CVD) is one of the most common non-communicable diseases and the main cause of death worldwide [1,2]. The burden of CVD is increasing in developing countries [1]. In the Saudi population, the estimated prevalence of CVD is approximately 5.5% [3]. More than 50% of CVD mortality was estimated to be caused by the main modifiable risk factors, namely hypertension, diabetes mellitus, hyperlipidemia, obesity, and smoking [4]. A World Health Organization report estimated that about 37% of deaths from non-communicable diseases in all ages are caused by CVD in Saudi Arabia [5].
The prevalence of diabetes mellitus and hypertension among the Saudi population is 10.1% and 13.5%, respectively [6,7]. It is also estimated that the percentage of smokers has reached 14% of the total population (15 years and above) in Saudi Arabia [8]. In the United Arab Emirates, 28.4% of the population was found to have a Framingham risk score >20% in one cross-sectional community-based study [9].
Current recommendations on the prevention of CVD focus on the need to reduce the total cardiovascular risk of an individual rather than the presence of any particular risk factor [10,11]. For this reason, estimating the risk of cardiovascular events using statistical equations has drawn the interest of many researchers in the last few decades. Multiple risk prediction models and charts have been developed and utilized in clinical practice for the prevention, early detection, and management of CVD. Examples of prediction models include the atherosclerotic cardiovascular disease (ASCVD) risk calculator recommended by the American College of Cardiology/American Heart Association, the Framingham risk assessment score, and the QRISK assessment score, which was updated in 2017 [11,12,13]. There is a significant need for prediction models that target Arab populations [14]. Recently, the first Arab model was developed and validated, specifically for Omani individuals with type 2 diabetes mellitus, based on a retrospective cohort study with a sample size of 2039 patients [15,16].
Recent evidence has shown that using risk prediction models leads to better outcomes in risk management and prevention. Optimal CVD risk assessment for individuals within a specific population requires the development of different risk assessment models specific to different regions and ethnicities based on variables measured from these local populations. One model cannot accurately estimate CVD risk in different populations [17].
In view of the fact that Saudi Arabia has an alarmingly high incidence of CVD and its associated risk factors, and that, to the best of our knowledge, no CVD risk prediction model has been locally developed, it is vitally important that a specific risk assessment tool be created for the Saudi population. Such a model will help shape local CVD prevention and management strategies. To fulfill this need, this study was initiated, with the aim of developing the first 10-year CVD risk prediction model for Saudi adults aged 18 to 75 years.

Sample Size
Our sample size estimation was based on a comparison of CVD between the diabetic and non-diabetic groups. According to the literature, the two-year event rate for CVD is 10% among non-diabetics and about 45% among diabetics [18]. The hazard ratio (HR) is a measure of how often a particular event happens in one group compared to how often it happens in another group over time. The total number of subjects that we needed to recruit to detect an HR of 3 (the effect size under the alternative hypothesis, tested against a null hypothesis of HR = 1), with a type I error rate of 5% and a power of 80%, was 350 [19].
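As a rough illustration of how a calculation of this kind works, the sketch below uses Schoenfeld's approximation for the number of events required to detect a given hazard ratio in a two-group comparison. It is not the procedure of reference [19] and does not attempt to reproduce the 350-subject figure; the equal group allocation and the overall event probability passed to `required_subjects` are assumptions chosen for illustration only.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80, p_group=0.5):
    """Schoenfeld's approximation: events needed for a two-sided test.

    p_group is the fraction of subjects in one of the two groups
    (0.5 = equal allocation, an assumption of this sketch).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - type II error
    log_hr = math.log(hazard_ratio)
    return (z_alpha + z_beta) ** 2 / (p_group * (1 - p_group) * log_hr ** 2)


def required_subjects(events, overall_event_probability):
    """Convert required events into subjects, given the probability that
    a subject experiences the event during follow-up."""
    return math.ceil(events / overall_event_probability)


events = required_events(3.0)
# 0.10 is a hypothetical overall event probability, not a value from the study.
subjects = required_subjects(events, 0.10)
print(round(events, 1), subjects)
```

The approximation makes explicit that the required number of events depends only on the target HR, the error rates, and the allocation ratio, whereas the number of subjects also depends on how many of them are expected to have an event during follow-up.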

Variables
The collected data included the following variables (listed in Appendix A): demographics such as age, gender, region of residence, marital status, smoking history, and employment status. Additionally, the average height, weight, and body mass index (BMI), as well as the average of 5 readings from different years of blood pressure, lipid profile, fasting glucose, hemoglobin A1C, and estimated glomerular filtration rate (eGFR), were collected.
During the chart review, any confirmed physician diagnosis of the following diseases at any point in the follow-ups was also recorded: hypertension, diabetes mellitus, dyslipidemia, heart failure, rheumatoid arthritis, atrial fibrillation, albuminuria, and chronic kidney disease (CKD). In addition, any history of premature (women less than 65 years and men less than 55 years) cardiovascular events in a first-degree relative, which includes parents, offspring, and siblings, was noted. Information about any medications used during the follow-up period, including antihypertensive, antihyperlipidemic, antidiabetic, or antithrombotic drugs, was collected.
The outcome was defined as the first fatal or non-fatal CVD event confirmed and recorded by a physician, including coronary heart disease (stable angina, unstable angina, or acute myocardial infarction) and stroke (ischemic or hemorrhagic). Any patient with a confirmed diagnosis of CVD, heart failure, or end-stage renal disease prior to 2002 was excluded.

Statistical Analysis
The statistical analyses were carried out using JMP version 14.0 (Cary, NC, USA) and Stata version 17.0 (College Station, TX, USA). Categorical variables were presented as proportions. Continuous variables were expressed as means and standard deviations (SDs). The level of statistical significance was set at p < 0.05. We fit Cox survival models and selected the best model based on the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), which assess and compare the performance and parsimony of competing models. Both the AIC and the BIC are commonly used information-theoretic measures for model selection. Typically, the two measures agree with each other. They balance bias against variance, and they differ in that the BIC adds a penalty that grows with the sample size. Other model selection techniques (e.g., Lasso) were not considered, on the assumption that they would not add significantly to the results [20,21]. Cox regression modeling was used to identify independent risk factors associated with CVD and to develop the CVD risk prediction model using the manual addition and deletion method. Missing data were handled using complete case analysis (CCA).
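The two criteria can be written in a few lines. The sketch below assumes the standard definitions, AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L, where log L is the maximized (partial) log-likelihood, k is the number of estimated coefficients, and n is the sample size. The candidate models and their log-likelihoods are hypothetical, and whether n should be the number of subjects or the number of events in a Cox model is a convention that differs between packages; the subject count (451) is used here purely as an assumption.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: fixed penalty of 2 per parameter."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: penalty of ln(n_obs) per parameter."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


N_OBS = 451  # subjects in the study; using subjects rather than events is an assumption

# Hypothetical candidates: name -> (partial log-likelihood, number of coefficients)
candidates = {
    "model_a": (-140.0, 6),
    "model_b": (-138.5, 9),
}

scores = {
    name: {"aic": aic(ll, k), "bic": bic(ll, k, N_OBS)}
    for name, (ll, k) in candidates.items()
}
best_by_bic = min(scores, key=lambda name: scores[name]["bic"])
best_by_aic = min(scores, key=lambda name: scores[name]["aic"])
print(best_by_bic, best_by_aic)
```

In this made-up comparison the larger model fits slightly better but is penalized for its three extra coefficients, so both criteria prefer the smaller model; the BIC penalizes the extra coefficients more strongly because ln(451) is greater than 2.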

Construction of the Model
The Cox regression model was used to identify the risk factors associated with CVD and to develop the CVD risk prediction model. Univariate analysis of all 32 variables was performed to determine which risk factors would be included in the model; variables that tended toward significance were carried forward into multivariate models. More than 10 multivariate models were created. To select the best-fitting model, the Bayesian information criterion (BIC) was used, with a lower BIC value indicating a better model. The final model included 6 independent risk factors (i.e., FBS, HDL-c, heart failure, antihyperlipidemic therapy, antithrombotic therapy, and antihypertensive therapy), and its BIC was 314.4.

Scoring System
In this study, longitudinal data were gathered from 451 patients from a family medicine outpatient service. To facilitate the use of the prediction model in daily practice, a point system was formulated based on the methods of Sullivan et al. [22]. The categorization of the continuous variables was guided by clinical significance, with the reference value determined as the mid-point of each category. The remaining risk factors were modeled using sets of indicator variables (0,1). The referent risk factor profile was chosen to be an individual with an FBS of 5.6 and a total HDL-c of 2, without a history of heart failure and not receiving antihyperlipidemic, antihypertensive, or antithrombotic therapy. The inter-category distances were determined in terms of regression units for each risk factor. A constant was applied to each inter-category distance to derive a point value, and the risk estimate (probability of developing an event over the predetermined time frame) was determined from the total points across the risk factors. This constant reflects the increase in risk associated with a one-unit increase in the FBS point. Each derived point value was rounded to a whole number. The theoretical range of this point system is 0 to 47.
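The point assignment described above can be sketched as follows. This is a minimal illustration of the general recipe of Sullivan et al., not the study's actual scoring: the coefficients, the high-FBS category mid-point, and the choice of constant are hypothetical placeholders, and only the FBS referent of 5.6 is taken from the text.

```python
def category_points(beta, category_value, referent_value, constant_b):
    """Points for one risk-factor category.

    The distance from the referent profile is expressed in regression
    units (beta * difference), divided by the constant B, and rounded
    to a whole number.
    """
    distance = beta * (category_value - referent_value)
    return round(distance / constant_b)


# Hypothetical coefficients, NOT the values reported in Table 3.
BETA_FBS = 0.20            # per 1-unit increase in FBS (assumed)
BETA_HEART_FAILURE = 0.84  # indicator variable, 1 = present (assumed)

# Constant B: regression units per point. Here one point corresponds to
# the effect of a 1-unit increase in FBS (an assumption of this sketch).
CONSTANT_B = BETA_FBS * 1.0

FBS_REFERENT = 5.6           # referent FBS value from the text
FBS_CATEGORY_MIDPOINT = 8.0  # hypothetical mid-point of a high-FBS category

fbs_points = category_points(BETA_FBS, FBS_CATEGORY_MIDPOINT, FBS_REFERENT, CONSTANT_B)
hf_points = category_points(BETA_HEART_FAILURE, 1, 0, CONSTANT_B)
total_points = fbs_points + hf_points
print(fbs_points, hf_points, total_points)
```

By construction, a patient who matches the referent profile on a risk factor receives 0 points for it, and the total across risk factors is what is later mapped to an estimated risk.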

Results
Between 2002 and 2012, a total of 451 Saudi male and female patients who were seen in the Family Medicine & Polyclinics Department at King Faisal Specialist Hospital and Research Centre (KFSH&RC) in Riyadh were reviewed retrospectively. Table 1 displays the distribution of risk factor characteristics among the sample at baseline. The mean age was 43.9 years, and 35 patients developed CVD events during the study period. The mean FBS at baseline was 6.15 mmol/L. The majority of the studied patients were non-smokers and had no family history of premature CVD. Table 2 presents the frequency and percentage of the lipid panel results. Over a quarter (26.14%) of the patients had borderline LDL-c levels, while the majority (58.41%) had normal HDL-c levels. Nearly three quarters (72.27%) had normal triglyceride levels.
Based on a univariate analysis of 32 clinically relevant variables (Table S1), six variables were found to be significantly associated with CVD events (p < 0.05) and were included in the best-fitting multivariate model presented in Table 3. The predictors of CVD were FBS, HDL-c, heart failure, antihyperlipidemic therapy, antithrombotic therapy, and antihypertensive therapy. Table 3 presents the coefficients (also known as betas) of the Cox proportional hazards model, along with the means (or proportions positive for each risk-factor category). The beta value represents the estimated regression coefficient for each predictor of CVD. A positive beta value indicates that an increase in the predictor variable is associated with an increased risk of developing CVD, and vice versa. Heart failure, antihyperlipidemic therapy, antithrombotic therapy, and antihypertensive therapy were analyzed as time-varying covariates, and the proportions took into account that the occurrence of a covariate (e.g., heart failure) might happen after the cardiovascular event and therefore should not be counted. The average 10-year event-free rate was 94.5%. During the 10-year follow-up, 35 (7.7%) of the 451 participants developed cardiovascular events (as shown in Figure 1). Table 4 presents the points assigned to the variables used to estimate the multivariate risk of CVD. An illustration of using the point system is provided in Appendix B, and the risk estimation with corresponding points is shown in Table 5.
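To make the interpretation of the betas concrete, the sketch below converts a Cox coefficient into a hazard ratio and computes a Framingham-style 10-year risk from a baseline event-free rate. It assumes that the 94.5% average event-free rate plays the role of the baseline survival at mean covariate values, as in the approach of Sullivan et al.; the linear predictor values are hypothetical and are not taken from Table 3.

```python
import math


def hazard_ratio(beta, delta=1.0):
    """Hazard ratio for a change of `delta` units in a predictor."""
    return math.exp(beta * delta)


def ten_year_risk(baseline_survival, linear_predictor, mean_linear_predictor):
    """Risk = 1 - S0 ** exp(sum(beta * x) - sum(beta * mean_x)).

    baseline_survival is the event-free probability at the mean
    covariate profile over the prediction horizon.
    """
    relative_hazard = math.exp(linear_predictor - mean_linear_predictor)
    return 1.0 - baseline_survival ** relative_hazard


BASELINE_SURVIVAL = 0.945  # average 10-year event-free rate from the text

# A patient at the mean covariate profile gets the average risk.
average_risk = ten_year_risk(BASELINE_SURVIVAL, 0.0, 0.0)

# Hypothetical patient whose linear predictor is ln(2) above the mean,
# i.e. whose hazard is twice that of the average patient.
doubled_hazard_risk = ten_year_risk(BASELINE_SURVIVAL, math.log(2.0), 0.0)
print(round(average_risk, 3), round(doubled_hazard_risk, 3))
```

A positive beta gives a hazard ratio above 1 and a negative beta gives one below 1, which is the direction-of-effect reading described for Table 3.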

Discussion
This is the first CVD risk prediction tool developed specifically for the Saudi Arabian population. The cumulative incidence of CVD in this study was 7.7%. Accurate assessment of cardiovascular risk is essential to effectively weigh the risks and benefits of treatment. The American College of Cardiology's ASCVD risk assessment tool and the Framingham calculator are widely trusted and validated tools, but they are more accurate when used in the populations for which they were developed. Both tools have been found to significantly overestimate cardiovascular risk in multi-ethnic cohorts of patients [23]. The Korean Heart Study, which included 200,000 Korean adults [24], also found that the American College of Cardiology's ASCVD equations overestimated ASCVD risk in Korea, and that the Korean risk prediction model showed the best predictive capability for cardiovascular risk. The ACC calculator was derived from patient cohorts in the 1970s and 1980s, which may be another reason for overestimation in this cohort [25]. Therefore, we developed the cornerstone of a CVD risk prediction tool for the Saudi Arabian population, which we hope will pave the way for similar studies.
The Framingham Heart Study was initially conducted on 5209 patients over a 6-year interval. The included risk factors comprised age, gender, blood pressure, LDL and HDL cholesterol, smoking, glucose status, and cardiac enlargement [26]. Since then, many other risk models have been developed. They differ in various aspects, including the types of populations, endpoints, and predictor variables, leading to widely varying risk estimates [25]. The ASCVD risk calculator included 13 predictors, as did the newest version of the UK Prospective Diabetes Study (UKPDS) [11,27]. In contrast, the variables in our study included FBS, HDL-c, use of antihypertensive therapy, antihyperlipidemic therapy, antithrombotic therapy, and heart failure. In our study, we incorporated stroke into the CVD outcome.
In consonance with the ASCVD risk calculator [11], we found that diabetes mellitus and low HDL-c levels are significantly associated with CVD. We found that participants on antihypertensive therapy were at higher risk of developing an event compared with those who were not. In contrast, UKPDS revealed that being on an antihypertensive medication decreases the risk of developing cardiovascular events in the general population.
Compared with the ASCVD risk calculator or the QRISK3 tool, which are currently used in Saudi Arabia, our point system assigns 11 points for FBS > 7.0 mmol/L and 13 points for an HDL-c level < 1.03 mmol/L.
Additionally, to facilitate the use of this tool by clinicians, it can be converted into a program or an application. Further studies are needed to validate its accuracy and applicability among the Saudi Arabian population.
One limitation of our study is that it was conducted on a relatively small and narrow sample that did not include all regions of Saudi Arabia, which may limit the generalizability of the results. In addition, our sample size estimation was based on a comparison of CVD between diabetic and non-diabetic groups, and the estimate could have differed had it been based on other risk factors, such as the presence or absence of hypertension or dyslipidemia. Additionally, the number of events was small, which likely affected the estimated effect sizes and the power of the analysis. The lack of external validation of our model to gauge its potential transferability to other cohorts of Saudi patients is a noteworthy limitation. A further shortcoming is that our model was not compared with validated and generally accepted international risk scores from Europe and America. Moreover, of the 6 independent risk factors included in the final prediction model, the use of heart failure developing during the follow-up period as an explanatory variable implies a time-dependent Cox proportional hazards model, and the data would have been better analyzed as such. Lastly, this study identified antihypertensive therapy as a CVD risk factor. However, the usefulness of antihypertensive therapy as a predictor of CVD risk may be questionable, as antihypertensive use is highly heterogeneous. For example, some patients use only 1 drug, some use more than 3 drugs, and some have their blood pressure under control, whereas others do not.
However, the present study does add a valuable prediction model, as the sample size calculation was representative. The Omani and Australian tools were developed exclusively for patients with type 2 diabetes mellitus, and they had larger sample sizes than ours [15,28]. However, their follow-up period was 5 years, which is half of ours. Another limitation is the lack of documentation of lifestyle history, such as diet and physical activity, so we cannot rule out its contribution to CVD. However, it is worth noting that many other international risk prediction tools also do not include lifestyle history in their models.
Clearly, the accuracy of risk estimation models will be negatively affected if the models are applied to populations different from the one they were derived from, or to the same population but at a later time, when significant changes in cardiovascular mortality may have occurred. In such situations, it becomes critically important to derive a new model from recent local cohorts of patients [29]. Cardiovascular risk assessment depends on the risk factor profile, as well as the average CVD risk and the risk-factor levels in the specified population [17].
Further prospective cohort studies need to be conducted to better model our local population, with particular care to include a large population of older age and higher event rates. Following this, external validation is an essential step to ensure the transferability of the model, i.e., that it can be applied to other cohorts of patients and not only the derivation cohort. The conclusion that may be drawn is that clinicians should think twice before applying commonly used CVD risk prediction equations for ASCVD risk stratification in specific populations.

Conclusions
As of today, no CVD risk prediction model has been locally developed in Saudi Arabia. Thus, in this research, we endeavored to develop the first 10-year CVD risk prediction model using data from 451 Saudi adults aged 18 to 75 years who attended the Family Medicine & Polyclinics Department at King Faisal Specialist Hospital and Research Centre (KFSH&RC) in Riyadh. Methodologically, the Cox regression model was used to identify the risk factors and establish the CVD risk prediction model. Key limitations of the presented CVD risk prediction model include the preliminary nature of the report, the small sample size, and the recruitment of all patients from a single institution. Future research should include large prospective cohort studies to better model our local population, followed by external validation studies to establish the transferability of the model to the larger population of Saudi Arabia. All in all, we believe that our CVD prediction model has significant potential to be widely used in clinical practice after undergoing a validation study.